Aerospace & Defense


af97b61e9fef8f55e32a2602af364d8c-Supplemental-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing Systems

The body frame's origin is fixed at the aircraft. The motion equations are derived from Newton's second law for an air vehicle, resulting in six core equations. The aerodynamic drag D, cross force C, and lift L account for the effects of external airflow. The inputs to the flight-path equations (FPEs) are the aircraft's attitude quaternion components along with the velocity components. The rigid-body kinematic equations (KEs) are expressed using the aircraft's attitude quaternion components [9]. The system comprising the CLMEs, CAMEs, FPEs, and KEs, i.e., Eqs. (1), (12), (15), and (16), constitutes the complete flight dynamics model. Task scenarios can be categorized by objective into Heading, Control, and Tracking. This work designs a hierarchical control algorithm for this task. RL methods: we use PPO for the Heading and Control tasks in fixed-wing aircraft. The structure of the hierarchical RL method is shown in Figure 10. The PPO algorithm's parameter settings are as follows: the learning rate is 3 × 10⁻⁴, the hidden layer sizes are "128 128", and the recurrent hidden layer size is 128 with a single recurrent layer.
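
As a concrete reference for the rigid-body kinematic equations mentioned above, a standard textbook form of the quaternion attitude kinematics is sketched below. The symbols q_0..q_3 (unit attitude quaternion) and p, q, r (body-axis angular rates) are assumptions made here for illustration, and the sign and ordering conventions may differ from Eq. (16) in the paper.

```latex
% Standard quaternion attitude kinematics (a textbook form; conventions may
% differ from the paper's Eq. (16)). (q_0, q_1, q_2, q_3) is the unit attitude
% quaternion and (p, q, r) are the body-axis angular rates.
\begin{equation*}
\begin{bmatrix} \dot{q}_0 \\ \dot{q}_1 \\ \dot{q}_2 \\ \dot{q}_3 \end{bmatrix}
= \frac{1}{2}
\begin{bmatrix}
 -q_1 & -q_2 & -q_3 \\
  q_0 & -q_3 &  q_2 \\
  q_3 &  q_0 & -q_1 \\
 -q_2 &  q_1 &  q_0
\end{bmatrix}
\begin{bmatrix} p \\ q \\ r \end{bmatrix}
\end{equation*}
```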


NeuralPlane: An Efficiently Parallelizable Platform for Fixed-wing Aircraft Control with Reinforcement Learning

Neural Information Processing Systems

Reinforcement learning (RL) demonstrates superior potential over traditional flight control methods for fixed-wing aircraft, particularly under extreme operational conditions. However, the high demand for training samples and the lack of efficient computation in existing simulators hinder its further application. In this paper, we introduce NeuralPlane, the first benchmark platform for large-scale parallel simulations of fixed-wing aircraft. NeuralPlane significantly boosts high-fidelity simulation via GPU-accelerated Flight Dynamics Model (FDM) computation, achieving a single-step simulation time of just 0.2 seconds at a parallel scale of 10
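
To illustrate the kind of GPU-parallel flight-dynamics stepping the abstract describes, here is a minimal PyTorch sketch that advances a batch of aircraft states in lockstep. The state layout, the dynamics() placeholder, and the explicit Euler update are illustrative assumptions, not NeuralPlane's actual FDM or API.

```python
# Minimal sketch of batched flight-dynamics stepping on a GPU.
# State layout, dynamics() placeholder, and Euler integration are assumptions,
# not NeuralPlane's actual FDM or API.
import torch

N = 10_000           # number of parallel aircraft
STATE_DIM = 13       # e.g. position(3) + velocity(3) + quaternion(4) + body rates(3)
CTRL_DIM = 4         # e.g. aileron, elevator, rudder, throttle
DT = 0.02            # integration step in seconds

device = "cuda" if torch.cuda.is_available() else "cpu"
states = torch.zeros(N, STATE_DIM, device=device)
states[:, 6] = 1.0   # identity attitude quaternion (q0 = 1)

def dynamics(x: torch.Tensor, u: torch.Tensor) -> torch.Tensor:
    """Placeholder for the batched FDM right-hand side dx/dt = f(x, u)."""
    return torch.zeros_like(x)  # a real FDM would evaluate forces, moments, and kinematics here

@torch.no_grad()
def step(x: torch.Tensor, u: torch.Tensor) -> torch.Tensor:
    """Advance every aircraft one step with explicit Euler (real simulators often use RK4)."""
    x = x + DT * dynamics(x, u)
    q = x[:, 6:10]
    x[:, 6:10] = q / q.norm(dim=1, keepdim=True)  # re-normalise the attitude quaternion
    return x

controls = torch.zeros(N, CTRL_DIM, device=device)
states = step(states, controls)
```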


COSMIC: Compress Satellite Images Efficiently via Diffusion Compensation

Neural Information Processing Systems

With the rapidly increasing number of satellites in space and their enhanced capabilities, the volume of Earth observation images collected by satellites is exceeding the transmission capacity of satellite-to-ground links. Although existing learned image compression solutions achieve remarkable performance by using a sophisticated encoder to extract rich features for compression and a decoder to reconstruct the image, it is still hard to deploy those complex encoders directly on current satellites' embedded GPUs, which have limited computing capability and power supply, to compress images in orbit. In this paper, we propose COSMIC, a simple yet effective learned compression solution for transmitting satellite images.
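
The sketch below illustrates the asymmetric setup the abstract motivates: a very small encoder that could plausibly run on a satellite's embedded GPU, paired with a much heavier ground-side decoder. The layer sizes and structure are illustrative assumptions, not COSMIC's design.

```python
# Hypothetical lightweight-encoder / heavy-decoder split; not COSMIC's architecture.
import torch
import torch.nn as nn

class LightEncoder(nn.Module):
    """Small strided-conv encoder producing a compact latent to downlink."""
    def __init__(self, latent_channels: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, latent_channels, 5, stride=2, padding=2),
        )
    def forward(self, x):
        return self.net(x)

class HeavyDecoder(nn.Module):
    """Larger ground-side decoder that reconstructs the image from the latent."""
    def __init__(self, latent_channels: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(latent_channels, 128, 5, stride=2, padding=2, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, 128, 5, stride=2, padding=2, output_padding=1), nn.ReLU(),
            nn.Conv2d(128, 3, 3, padding=1),
        )
    def forward(self, z):
        return self.net(z)

enc, dec = LightEncoder(), HeavyDecoder()
params = lambda m: sum(p.numel() for p in m.parameters())
print(f"encoder params: {params(enc):,}  decoder params: {params(dec):,}")
x = torch.rand(1, 3, 256, 256)
print(dec(enc(x)).shape)  # reconstruction keeps the input resolution
```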


UAV3D: A Large-scale 3D Perception Benchmark for Unmanned Aerial Vehicles

Neural Information Processing Systems

Unmanned Aerial Vehicles (UAVs), equipped with cameras, are employed in numerous applications, including aerial photography, surveillance, and agriculture. In these applications, robust object detection and tracking are essential for the effective deployment of UAVs. However, existing benchmarks for UAV applications are mainly designed for traditional 2D perception tasks, restricting the development of real-world applications that require a 3D understanding of the environment. Furthermore, despite recent advancements in single-UAV perception, limited views of a single UAV platform significantly constrain its perception capabilities over long distances or in occluded areas. To address these challenges, we introduce UAV3D - a benchmark designed to advance research in both 3D and collaborative 3D perception tasks with UAVs. UAV3D comprises 1,000 scenes, each of which has 20 frames with fully annotated 3D bounding boxes on vehicles. We provide the benchmark for four 3D perception tasks: single-UAV 3D object detection, single-UAV object tracking, collaborative-UAV 3D object detection, and collaborative-UAV object tracking.
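
To make "20 frames with fully annotated 3D bounding boxes" concrete, here is a hypothetical record structure for one annotated object in a scene frame. The field names and conventions are assumptions for illustration, not UAV3D's actual schema.

```python
# Hypothetical annotation record; field names are assumptions, not UAV3D's schema.
from dataclasses import dataclass

@dataclass
class BoxAnnotation3D:
    scene_id: str      # one of the 1,000 scenes
    frame_index: int   # 0..19 within the scene
    uav_id: int        # which UAV observed the object (relevant for collaborative tasks)
    category: str      # e.g. "vehicle"
    center_xyz: tuple  # box centre in the world frame, metres
    size_lwh: tuple    # length, width, height in metres
    yaw: float         # heading angle in radians
    track_id: int      # consistent across frames, used for the tracking tasks

ann = BoxAnnotation3D("scene_0001", 0, 2, "vehicle", (12.3, -4.1, 0.8), (4.5, 1.9, 1.6), 1.57, 42)
print(ann)
```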


Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach

Neural Information Processing Systems

Web-scale visual entity recognition, the task of associating images with their corresponding entities within vast knowledge bases like Wikipedia, presents significant challenges due to the lack of clean, large-scale training data. In this paper, we propose a novel methodology to curate such a dataset, leveraging a multimodal large language model (LLM) for label verification, metadata generation, and rationale explanation. Instead of relying on the multimodal LLM to directly annotate data, which we found to be suboptimal, we prompt it to reason about potential candidate entity labels by accessing additional contextually relevant information (such as Wikipedia), resulting in more accurate annotations. We further use the multimodal LLM to enrich the dataset by generating question-answer pairs and a grounded fine-grained textual description (referred to as "rationale") that explains the connection between images and their assigned entities. Experiments demonstrate that models trained on this automatically curated data achieve state-of-the-art performance on web-scale visual entity recognition tasks (e.g.
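
The sketch below illustrates the label-verification step described above: rather than asking a multimodal LLM to label an image directly, the prompt presents a candidate entity together with retrieved Wikipedia context and asks for a verification, a rationale, and QA pairs. The prompt wording and the example entity are assumptions, not the paper's actual prompts.

```python
# Hypothetical prompt construction for LLM-driven label verification; not the paper's prompts.
def build_verification_prompt(candidate_entity: str, wikipedia_summary: str) -> str:
    """Compose the text side of the query; the image is attached to the same LLM call."""
    return (
        "You are verifying a candidate entity label for the attached image.\n"
        f"Candidate entity: {candidate_entity}\n"
        f"Context from Wikipedia: {wikipedia_summary}\n"
        'Respond in JSON with fields: "label_correct" (true/false), '
        '"rationale" (one sentence grounding the entity in visible image evidence), '
        'and "qa_pairs" (a list of question/answer pairs about the image).'
    )

print(build_verification_prompt(
    "Airbus A380",
    "The Airbus A380 is a very large, wide-body, four-engine airliner.",
))
# The composed prompt and the image go to the multimodal LLM; the returned JSON supplies
# the verified label, the rationale, and the question-answer pairs for the dataset.
```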



Advancing Fine-Grained Classification by Structure and Subject Preserving Augmentation

Neural Information Processing Systems

Fine-grained visual classification (FGVC) involves classifying closely related sub-classes. This task is difficult due to the subtle differences between classes and the high intra-class variance. Moreover, FGVC datasets are typically small and challenging to gather, thus highlighting a significant need for effective data augmentation. Recent advancements in text-to-image diffusion models offer new possibilities for augmenting classification datasets. While these models have been used to generate training data for classification tasks, their effectiveness in full-dataset training of FGVC models remains under-explored.
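
As a generic illustration of diffusion-based augmentation for an FGVC class (not the paper's structure- and subject-preserving method), the sketch below uses an img2img pipeline so the generated sample stays close to a real source image. The model id, prompt template, and strength value are assumptions.

```python
# Generic diffusion img2img augmentation sketch; NOT the paper's method.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def augment(source_path: str, class_name: str, strength: float = 0.35) -> Image.Image:
    """Produce one synthetic training image that stays close to the source's structure."""
    source = Image.open(source_path).convert("RGB").resize((512, 512))
    prompt = f"a photo of a {class_name}"  # fine-grained class name as the text condition
    result = pipe(prompt=prompt, image=source, strength=strength, guidance_scale=7.5)
    return result.images[0]

# Example (hypothetical paths): augment("cub/001/img_0001.jpg", "Black-footed Albatross").save("aug_0001.png")
```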


Physics-Driven ML-Based Modelling for Correcting Inverse Estimation

Neural Information Processing Systems

When deploying machine learning estimators in science and engineering (SAE) domains, it is critical to avoid failed estimations that can have disastrous consequences, e.g., in aero engine design. This work focuses on detecting and correcting failed state estimations before adopting them in SAE inverse problems, by utilizing simulations and performance metrics guided by physical laws. We suggest flagging a machine learning estimation when its physical model error exceeds a feasible threshold, and propose a novel approach, GEESE, to correct it through optimization, aiming at delivering both low error and high efficiency. The key designs of GEESE include (1) a hybrid surrogate error model that provides fast error estimations to reduce simulation cost and enables gradient-based backpropagation of error feedback, and (2) two generative models that approximate the probability distributions of candidate states to simulate exploitation and exploration behaviours. All three models are constructed as neural networks. GEESE is tested on three real-world SAE inverse problems and compared to a number of state-of-the-art optimization/search approaches. Results show that it fails least often at finding a feasible state correction and, in general, requires physical evaluations less frequently.
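
The sketch below shows a simplified flag-and-correct loop in the spirit of the approach described above: accept an ML estimate if its physics-based error is within tolerance, otherwise search for a corrected state under a limited evaluation budget. The surrogate and generative-model machinery of GEESE is omitted; physics_error(), propose_candidates(), and the threshold are illustrative assumptions.

```python
# Simplified flag-and-correct loop; a toy stand-in, not the GEESE implementation.
import numpy as np

def physics_error(state: np.ndarray) -> float:
    """Stand-in for the (expensive) physics-guided error metric of a candidate state."""
    return float(np.sum((state - 1.0) ** 2))  # toy objective with optimum at all-ones

def propose_candidates(center: np.ndarray, n: int, scale: float) -> np.ndarray:
    """Stand-in for generative proposal models: sample states near the current best."""
    return center + scale * np.random.randn(n, center.size)

def correct_if_needed(estimate: np.ndarray, threshold: float = 1e-2,
                      budget: int = 50, n_per_round: int = 16) -> np.ndarray:
    best, best_err = estimate, physics_error(estimate)
    if best_err <= threshold:          # estimation is feasible, adopt it as-is
        return best
    for _ in range(budget):            # otherwise spend a limited evaluation budget
        for cand in propose_candidates(best, n_per_round, scale=0.1):
            err = physics_error(cand)
            if err < best_err:
                best, best_err = cand, err
        if best_err <= threshold:
            break
    return best

corrected = correct_if_needed(np.zeros(4))
print(corrected, physics_error(corrected))
```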


Highlights From Starship's Test Flight 9: Everything That Happened in 17 Minutes

Mashable

Starship Test Flight 9 ends with "confirmation that the booster did demise." By Mashable Video on May 28, 2025. SpaceX conducted its ninth test flight of the Starship launch vehicle atop a Super Heavy booster from Starbase, Texas. See all the highlights from the test launch. Topics: SpaceX, Elon Musk, Rocket Launches.


AVOIDDS: Aircraft Vision-based Intruder Detection Dataset and Simulator

Neural Information Processing Systems

Designing robust machine learning systems remains an open problem, and there is a need for benchmark problems that cover both environmental changes and evaluation on a downstream task. In this work, we introduce AVOIDDS, a realistic object detection benchmark for the vision-based aircraft detect-and-avoid problem. We provide a labeled dataset consisting of 72,000 photorealistic images of intruder aircraft with various lighting conditions, weather conditions, relative geometries, and geographic locations. We also provide an interface that evaluates trained models on slices of this dataset to identify changes in performance with respect to changing environmental conditions. Finally, we implement a fully integrated, closed-loop simulator of the vision-based detect-and-avoid problem to evaluate trained models with respect to the downstream collision avoidance task. This benchmark will enable further research in the design of robust machine learning systems for use in safety-critical applications.
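
The sketch below illustrates the kind of slice-based evaluation the abstract describes: measure detector performance separately on subsets of the dataset grouped by environmental condition. The metadata fields, the slicing keys, and the toy accuracy metric are illustrative assumptions, not the AVOIDDS interface.

```python
# Hypothetical slice-based evaluation helper; not the AVOIDDS evaluation interface.
from collections import defaultdict

def evaluate_slices(samples, predictions, metric, keys=("lighting", "weather")):
    """Group samples by the chosen metadata keys and report the metric per slice."""
    groups = defaultdict(list)
    for sample, pred in zip(samples, predictions):
        slice_id = tuple(sample["meta"][k] for k in keys)
        groups[slice_id].append((sample, pred))
    return {slice_id: metric(items) for slice_id, items in groups.items()}

samples = [
    {"meta": {"lighting": "day", "weather": "clear"}, "label": 1},
    {"meta": {"lighting": "night", "weather": "clear"}, "label": 1},
    {"meta": {"lighting": "night", "weather": "clear"}, "label": 0},
]
predictions = [1, 0, 0]
accuracy = lambda items: sum(s["label"] == p for s, p in items) / len(items)
print(evaluate_slices(samples, predictions, accuracy))
# {('day', 'clear'): 1.0, ('night', 'clear'): 0.5}
```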